" Unit 3 - Lecture 1 "
"------------------------------------------------------------------------"

" 
Question:
What is the difference between Empirical and Theoretical results?

Example: Tossing a coin

"

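" Sketch for the question above. The seed and the number of
  tosses are illustrative assumptions, not part of the original
  notes. The theoretical P(Head) for a fair coin is 0.5, while
  the empirical value is the proportion of heads observed in a
  simulated sample. "

set.seed(1)                        # assumed seed, for reproducibility only
tosses = sample(c("H","T"),
                size = 1000,       # assumed number of tosses
                replace = TRUE)

theoretical.p = 0.5
empirical.p = mean(tosses == "H")  # proportion of heads in the sample

theoretical.p ; empirical.p
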
"------------------------------------------------------------------------"

" Working with Statistical Distributions "

" Key letter:
  d : PMF P(X = x) / PDF f(x)
  p : CDF = P[X <= x]
  q : Quantile
  r : Simulate a random sample "
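
" A quick illustration of the four prefixes on Bin(10, 0.5).
  The parameter values are chosen only as an example. The q
  function returns the smallest x with P[X <= x] >= the given
  probability, so it inverts the CDF. "

dbinom(5, size = 10, prob = 0.5)    # d : P[X = 5]
pbinom(5, size = 10, prob = 0.5)    # p : P[X <= 5]
qbinom(0.5, size = 10, prob = 0.5)  # q : smallest x with P[X <= x] >= 0.5
rbinom(3, size = 10, prob = 0.5)    # r : three simulated values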

"------------------------------------------------------------------------"

" Discrete Distributions "

"------------------------------------------------------------------------"

" 1.) Binomial Distribution "

" Say, X ~ Bin(n = 10,p = 0.5) "

" Calculate the value of PMF "
# From scratch
# P[X = x] = nCx * p^x * (1-p)^(n-x)

# P[X = 2]
x = 2 ; p = 0.5 ; n = 10
choose(n,x) * p^x * (1-p)^(n-x)

# Using function
# Syntax:
# dbinom(x, size, prob)

# P[X = 2]
dbinom(x,n,p)


" Calculate the value of CDF "
# From scratch
# P[X <= 2] = P[X = 0] + P[X = 1] + P[X = 2]

sum(dbinom(c(0,1,2),n,p))


# Using function
# Syntax:
# pbinom(q, size, prob, lower.tail = TRUE)

# P[X <= 2]
pbinom(2,n,p)

# P[X > 2] = 1 - P[X <= 2]
1 - pbinom(2,n,p)
pbinom(2,n,p,lower.tail = FALSE)



" Draw a random sample of size 100 from
  the above distribution "

# Using function
# rbinom(n, size, prob)

X = rbinom(n = 100,
           size = n,
           prob = 0.5)

max(X)

"
Note:
Here, the argument n = <no.> is the number of
data points to be simulated. It is not the
Binomial parameter n, which is passed as size.
"

"
Question:
Why is the output different
for each of us?
"


"

Explain:
1.) Use of set.seed(<no.>)
2.) Working of set.seed()

"

set.seed(10)
X = rbinom(1000,n,p)

Y = rbinom(1000,
           n,p)
head(X)
head(Y)
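
" A small check of how set.seed() works, using the same seed
  value 10 as above. Resetting the seed restarts the random
  number stream, so the same draw is reproduced. The names A
  and B are introduced only for this check. "

set.seed(10)
A = rbinom(5, size = 10, prob = 0.5)

set.seed(10)
B = rbinom(5, size = 10, prob = 0.5)

identical(A, B)   # TRUE : same seed, same sample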

"
For the above random sample, find
the following:
1.) Empirical Mean v/s Theoretical Mean
2.) Empirical Var v/s Theoretical Var
3.) Empirical Coefficient Of Skewness
4.) Empirical Mode (Numerically)

"
set.seed(10)
X = rbinom(1000,n,p)

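# 1.) Theoretical Mean v/s Empirical Mean
n * p ; mean(X)

# 2.) Theoretical Var v/s Empirical Var
n * p * (1 - p) ; var(X)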

# C.O.S = Mu.3 / (Mu.2^(3/2))

Mu.3 = mean((X - mean(X))^3)
Mu.2 = mean((X - mean(X))^2)

Mu.3 / (Mu.2^(3/2))


" Recall: "
" Raw Moments "
" m.r = E[x^(r)] "


" Central Moments "
" mu.r = E[(x - Mu)^(r)] "

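" The definitions above can be estimated from a sample by
  replacing the expectation with a sample average. The helper
  names and the sample S below are hypothetical, introduced
  only for illustration. "

emp.raw.moment = function(x, r) mean(x^r)
emp.central.moment = function(x, r) mean((x - mean(x))^r)

set.seed(10)
S = rbinom(1000, size = 10, prob = 0.5)

emp.raw.moment(S, 1)       # first raw moment = sample mean
emp.central.moment(S, 2)   # second central moment (divisor n, not n - 1)
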

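# Empirical Mode
TAB = data.frame(table(X))
# TAB$X is a factor, so convert it back to a number
as.numeric(as.character(TAB$X[which.max(TAB$Freq)]))

# Population Mode
# (support stored in x.vals so that the sample X is not overwritten)
x.vals = seq(0,10)
PMF = dbinom(x.vals,n,p)

x.vals[which.max(PMF)]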


"
Question:
Why do we see a difference between
Empirical and Theoretical results?

How can we ensure that the gap is minimised?
"

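" One way to explore the question above: repeat the simulation
  with growing sample sizes and watch the empirical mean settle
  near the theoretical mean n * p = 5. The sample sizes and the
  seed are illustrative assumptions. "

set.seed(10)
sizes = c(10, 100, 1000, 10000, 100000)

# Empirical mean of a Bin(10, 0.5) sample for each sample size
emp.mean = sapply(sizes,
                  function(m) mean(rbinom(m, size = 10, prob = 0.5)))

data.frame(sizes,
           emp.mean,
           abs.gap = abs(emp.mean - 10 * 0.5))
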
"------------------------------------------------------------------------"

" Exercise On Binomial Distribution "

" A box consists of 10 mobile phones which
  each have a defect. The probability that a
  phone can be repaired is 65%.

  Define:
  X ~ Bin(10,0.65)

  Find:
  1.) P(More than 4 phones are repaired)
  2.) Display the distribution of X.
  3.) Display and comment on how the distribution 
      changes for p = 10%, 25%, 50%, 
                     75%, 90%, 95%.
      
  4.) Display 3.) in one plot.

"
par(mfrow = c(2,3))


X = 0:10 ; n = 10

for(i in c(0.1,0.25,0.5,0.75,0.9,0.95)){
  
  # PMF of Bin(10, i) over the whole support
  PMF = dbinom(X,n,i)
  
  plot(X,PMF,
       type = "h",
       ylab = "P[X = x]",
       main = paste0("Distribution of Bin(10,",
                     i,")"))
  
}

par(mfrow = c(1,1))
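
" A possible approach to parts 1.) and 2.) of the exercise
  above, assuming the phones are repaired independently of
  each other. "

# 1.) P[X > 4] = 1 - P[X <= 4]
pbinom(4, size = 10, prob = 0.65, lower.tail = FALSE)

# 2.) PMF of X ~ Bin(10, 0.65)
plot(0:10,
     dbinom(0:10, size = 10, prob = 0.65),
     type = "h",
     xlab = "x",
     ylab = "P[X = x]",
     main = "Distribution of Bin(10,0.65)")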


"------------------------------------------------------------------------"

" 2.) Poisson Distribution "

" Calculate the value of PMF "
# Syntax:
# dpois(x,lambda)

" Calculate the value of CDF "
# Syntax:
# ppois(q, lambda, lower.tail = TRUE)


" Simulate data from Poisson Distribution "
# Syntax:
# rpois(n,lambda)
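
" The same pattern as in the Binomial section, shown for an
  example value lambda = 2 (chosen only for illustration). "

# From scratch
# P[X = x] = exp(-lambda) * lambda^x / x!
exp(-2) * 2^3 / factorial(3)

# Using function
dpois(3, lambda = 2)

# P[X <= 3] from the PMF and from the CDF function
sum(dpois(0:3, lambda = 2))
ppois(3, lambda = 2)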


"------------------------------------------------------------------------"

" Exercise On Poisson Distribution "

" The claim count on a portfolio of Motor Insurance
  policies follows Poi(lambda = 0.75 / day) 

  Calculate:
  1.) P(More than 4 claims are reported in a day)
  
  2.) Draw a random sample of size 200 from the
      above distribution and find:
      
      a.) Empirical Mean and Variance
          using a seed of 45.
      
  3.) Draw a random sample of size 1,000 from the
      above distribution and find:
      
      a.) Empirical Mean and Variance
          using a seed of 76.
      
  4.) Compare your answers in 2.) and 3.)
  
  5.) Find Empirical Mode and compare it with
      Theoretical Mode. Comment.
    
  6.) Calculate the empirical estimate
      of P[X >= 3]

"
lambda = 0.75

"1. "
# P[X > 4] = 1 - P[X <= 4]
ppois(4,lambda,lower.tail = FALSE)


"2. "

set.seed(45)
X = rpois(200,lambda)
mean(X) ; var(X)


"3. "

set.seed(76)
X = rpois(1000,lambda)
mean(X) ; var(X)
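
" A sketch for part 4.), re-drawing both samples so that they
  can be compared side by side. For Poi(lambda), the theoretical
  mean and variance both equal lambda = 0.75. The names X.200
  and X.1000 are introduced here only for the comparison. "

set.seed(45) ; X.200 = rpois(200, 0.75)
set.seed(76) ; X.1000 = rpois(1000, 0.75)

data.frame(size = c(200, 1000),
           mean = c(mean(X.200), mean(X.1000)),
           var  = c(var(X.200), var(X.1000)),
           theoretical = 0.75)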


"5. "

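TAB = data.frame(table(X))
# Empirical mode (TAB$X is a factor, so convert it back to a number)
as.numeric(as.character(TAB$X[which.max(TAB$Freq)]))

# Theoretical mode
# (support stored in x.vals so that the sample X is not overwritten)
x.vals = 0:20
PMF = dpois(x.vals,lambda)

x.vals[which.max(PMF)]


"6. "

# Empirical P[X >= 3], computed on the simulated sample X
length(X[X >= 3]) / length(X)
X[which(X >= 3)]

# Theoretical P[X >= 3], for comparison
ppois(2,lambda,lower.tail = FALSE)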


"------------------------------------------------------------------------"

" Prove the property that, If X ~ Poi(lambda),
  then as lambda -> Inf,
     X is approximately Normal(mean = lambda, var = lambda) "

lambda = c(0.5,1,5,15,30,40)

X = 0:100

par(mfrow = c(2,3))

for(i in lambda){
  
  PMF = dpois(X,i)
  
  plot(X,
       PMF,
       type = "h",
       main = paste("X ~ Poi(",i,")",
                    sep = ""))
  
}

par(mfrow = c(1,1))
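
" A rough numerical check of the approximation for lambda = 40:
  compare the Poisson CDF with the Normal CDF that has
  mean = lambda and sd = sqrt(lambda), using a continuity
  correction of 0.5. The evaluation points are arbitrary. "

pts = c(30, 35, 40, 45, 50)

data.frame(x = pts,
           poisson = ppois(pts, 40),
           normal  = pnorm(pts + 0.5, mean = 40, sd = sqrt(40)))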



"------------------------------------------------------------------------"

"
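Other Discrete Distributions:
- Geometric (Type II): dgeom(), pgeom(), qgeom(), rgeom()
- Hypergeometric: dhyper(), phyper(), qhyper(), rhyper()
- Negative Binomial (Type II): dnbinom(), pnbinom(), qnbinom(), rnbinom()

The prefixes d, p, q, r have the same meaning as for
the Binomial and Poisson functions.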

"

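" Example calls for the families listed above. All parameter
  values are illustrative assumptions. "

# Geometric (Type II): no. of failures before the first success
dgeom(2, prob = 0.3)              # P[X = 2] = 0.7^2 * 0.3

# Hypergeometric: no. of white balls in k draws without replacement
# from an urn with m white and n black balls
dhyper(1, m = 4, n = 6, k = 3)    # P[X = 1]

# Negative Binomial (Type II): no. of failures before the size-th success
pnbinom(5, size = 4, prob = 0.3)  # P[X <= 5]
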
"------------------------------------------------------------------------"

"
Exam Question:

N ~ Type 1 NegBin(k = 4, p = 0.3)
(a) Simulate 100 values from the 
    above distribution, 
    using the seed as 188. 
    Show the 
    empirical inter-quartile range. (2)

(b) Compare your answer
    to the population 
    inter-quartile range. (2)

(c) Without any further work, 
    how would the answer in (a) change if the
    number of simulated values increased? (1)
"

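" A possible approach to the exam question, assuming Type 1
  means the number of trials needed to reach the k-th success.
  R's nbinom functions count failures before the k-th success
  (Type II), so k is added to shift between the two. This is a
  sketch under that assumption, not a model answer. "

k = 4 ; p = 0.3

# (a) Empirical inter-quartile range
set.seed(188)
N = rnbinom(100, size = k, prob = p) + k
IQR(N)

# (b) Population inter-quartile range
#     (the shift by k cancels in the difference)
qnbinom(0.75, size = k, prob = p) - qnbinom(0.25, size = k, prob = p)

# (c) With more simulated values, the empirical IQR would be
#     expected to move closer to the population IQR.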
